Distributional PAC-Learning from Nisan's Natural Proofs
Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds for $\Lambda$ imply efficient algorithms for learning $\Lambda$-circuits, but only over \textit{the uniform distribution}, with \textit{membership queries}, and provided $\AC^0[p] \subseteq \Lambda$. We consider whether this implication can be generalized to $\Lambda \not\supseteq \AC^0[p]$, and to learning algorithms which use only random examples and learn over arbitrary example distributions (Valiant's PAC-learning model). We first observe that if natural proofs for $\Lambda$ implied PAC-learning for $\Lambda$ for every circuit class $\Lambda$, then standard assumptions from lattice-based cryptography would not hold. In particular, depth-2 majority circuits are a (conditional) counterexample to the implication: Nisan (1993) gave a natural proof for this class, whereas Klivans and Sherstov (2009) showed that PAC-learning it is hard under lattice-based assumptions. We thus ask: what learning algorithms can we reasonably expect to follow from Nisan's natural proofs? Our main result is that all natural proofs arising from a type of communication complexity argument, including Nisan's, imply PAC-learning algorithms in a new \textit{distributional} variant (i.e., an ``average-case'' relaxation) of Valiant's PAC model. Our distributional PAC model is stronger than the average-case prediction model of Blum et al. (1993) and the heuristic PAC model of Nanashima (2021), and has several properties that make it of independent interest, such as being \textit{boosting-friendly}. The main applications of our result are new distributional PAC-learning algorithms for depth-2 majority circuits, polytopes, and DNFs over natural target distributions, as well as the nonexistence of encoded-input weak PRFs that can be evaluated by depth-2 majority circuits.
Agnostic Membership Query Learning with Nontrivial Savings: New Results, Techniques
Agnostic learning [Hau92, KSS92] is an important generalization of PAC-learning [Val84]. It is meant to capture more accurately a common approach to machine learning, in which a predefined set of functions is searched for the one achieving the least error on data produced by some totally unknown process. Roughly speaking, the objective of an agnostic learning algorithm for a complexity class Λ is to output a hypothesis h whose error in approximating an arbitrary concept is nearly as small as that of the best possible hypothesis within Λ. The class Λ is referred to as the touchstone class. Designing computationally efficient (i.e., polynomial-time) agnostic learning algorithms for expressive touchstone classes has historically proved difficult. Even extremely simple touchstone classes such as parity functions are believed to be computationally hard to learn in the agnostic model [BFKL93]. Some positive results exist, however, including for piecewise functions [KSS92], restricted fan-in two-layer neural nets [Lee96], geometric patterns [GKS97], decision trees [GKK08], and halfspaces [KKMS08]. More positive results are known once we allow some combination of the common relaxations considered in computational learning theory, such as access to membership queries, distribution-specific learning, or super-polynomial runtime. For instance, the famed polynomial-time agnostic learning algorithm for parity functions due to [GL89] (sometimes also referred to as the KM algorithm after [KM91]) uses membership queries and requires a uniform distribution over unlabelled examples.
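To make the role of membership queries concrete, the following is a minimal sketch of a KM-style search for heavy Fourier coefficients over the uniform distribution. It is an illustration under stated assumptions, not the algorithm as given in [GL89] or [KM91]: the target f is assumed to map n-bit tuples to ±1, the names `chi`, `bucket_weight`, `heavy_coefficients`, and `noisy_parity` are hypothetical, and the sample count and pruning threshold are chosen for readability rather than derived from a rigorous confidence analysis. The sketch relies on the identity that the total squared Fourier weight of all index vectors extending a prefix α of length k equals E[f(xz) f(yz) χ_α(x) χ_α(y)], where x and y are independent uniform k-bit strings and z is a uniform (n-k)-bit string; evaluating f on the correlated pair xz, yz is exactly what requires query access.

```python
import random

def chi(alpha, x):
    """Parity character chi_alpha(x) = (-1)^<alpha, x> on equal-length bit tuples."""
    return -1 if sum(a & b for a, b in zip(alpha, x)) % 2 else 1

def bucket_weight(f, n, alpha, samples, rng):
    """Estimate the sum of f_hat(S)^2 over all S whose first len(alpha) bits equal alpha.

    Each sample makes two membership queries, f(x+z) and f(y+z), that share the suffix z.
    """
    k = len(alpha)
    total = 0
    for _ in range(samples):
        x = tuple(rng.getrandbits(1) for _ in range(k))
        y = tuple(rng.getrandbits(1) for _ in range(k))
        z = tuple(rng.getrandbits(1) for _ in range(n - k))
        total += f(x + z) * f(y + z) * chi(alpha, x) * chi(alpha, y)
    return total / samples

def heavy_coefficients(f, n, theta, samples=4000, seed=0):
    """Return index vectors whose estimated squared coefficient is at least theta^2 / 2.

    Prefixes are extended one bit at a time; a bucket is pruned when its estimated
    weight is too small to contain a coefficient of magnitude theta.
    """
    rng = random.Random(seed)
    frontier = [()]
    for _ in range(n):
        survivors = []
        for alpha in frontier:
            for bit in (0, 1):
                candidate = alpha + (bit,)
                if bucket_weight(f, n, candidate, samples, rng) >= theta ** 2 / 2:
                    survivors.append(candidate)
        frontier = survivors
    return frontier

# Illustrative target (an assumption for this example): a parity on bits 0, 2, 5
# of a 6-bit input, with its label flipped on four fixed inputs.
HIDDEN = (1, 0, 1, 0, 0, 1)
FLIPPED = {(0, 0, 0, 0, 0, 0), (1, 1, 0, 0, 0, 0), (0, 1, 1, 0, 1, 0), (1, 1, 1, 1, 1, 1)}

def noisy_parity(x):
    value = chi(HIDDEN, x)
    return -value if x in FLIPPED else value

if __name__ == "__main__":
    print(heavy_coefficients(noisy_parity, n=6, theta=0.5))
```

In this toy setting the hidden parity has correlation 0.875 with the target, while every other coefficient has magnitude at most 0.125, so the search should retain only the hidden index vector. The agnostic flavour is visible in the fact that the target is not itself a parity: the procedure only looks for the parity that correlates best with it.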